For my final assignment, I chose to reproduce some of the figures that I made this past fall in my Aquatic Ecology course 🐟🌿

Back in October, we conducted research over the course of a few weeks to investigate the impacts of ski development across lotic ecosystems.

  • To do so, we chose two streams that we expected to be impacted by ski development (the stream beside the parking lot of the Snowbowl and the portion of Brandy Brook that runs through Rikert) to compare to a nearby reference stream, Sparks Brook, that we considered to be virtually unaffected by ski infrastructure.

  • Research included collecting data on stream chemistry, tree canopy cover, and water temperature and absorbency.

  • We also conducted benthic invertebrate sampling to investigate if there were any differences in species assemblages between the sites.

This J term, I decided to reproduce the figures that I originally generated in Microsoft Excel on benthic invertebrate assemblages. Here’s what they looked like in Excel (apologies for the poor image quality!):

  • Each figure displays benthic invertebrate counts categorized by taxonomic order across all three sampling sites.

Generating these pie charts required a lot of data manipulation in Excel prior to the actual chart creation. I also had to create each of them separately. I wanted to see if I could do that data manipulation in R to reduce potential for error along the way. You can see my workflow below:

# load packages
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
library(ggthemes)

# load data
raw_insect_data <- read.table(file = "/Users/emilypower/Desktop/BIOL_1007A/Data/BIOL 0304a Ski-Stream Data.csv", header = TRUE, sep=",")
glimpse(raw_insect_data)
## Rows: 78
## Columns: 12
## $ Location <chr> "Snowbowl", "Snowbowl", "Snowbowl", "Snowbowl", "Snowbowl", "…
## $ Species  <chr> "Gomphidae", "Leuctridae", "Chloroperlidae", "Chironomidae", …
## $ Order    <chr> "Odonata", "Plecoptera", "Plecoptera", "Diptera", "Plecoptera…
## $ Count    <int> 1, 1, 1, 1, 1, 2, 1, 3, 2, 3, 3, 1, 4, 1, 1, 2, 1, 2, 1, 1, 2…
## $ X        <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ X.1      <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ X.2      <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ X.3      <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ X.4      <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ X.5      <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ X.6      <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ X.7      <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
head(raw_insect_data)
##   Location                      Species      Order Count  X X.1 X.2 X.3 X.4 X.5
## 1 Snowbowl                    Gomphidae    Odonata     1 NA  NA  NA  NA  NA  NA
## 2 Snowbowl                   Leuctridae Plecoptera     1 NA  NA  NA  NA  NA  NA
## 3 Snowbowl               Chloroperlidae Plecoptera     1 NA  NA  NA  NA  NA  NA
## 4 Snowbowl                 Chironomidae    Diptera     1 NA  NA  NA  NA  NA  NA
## 5 Snowbowl                Peltoperlidae Plecoptera     1 NA  NA  NA  NA  NA  NA
## 6 Snowbowl Unknown (mayfly or stonefly)       <NA>     2 NA  NA  NA  NA  NA  NA
##   X.6 X.7
## 1  NA  NA
## 2  NA  NA
## 3  NA  NA
## 4  NA  NA
## 5  NA  NA
## 6  NA  NA
# clean up data
insect_data <- raw_insect_data[complete.cases(raw_insect_data[,3]),] %>%  # remove counts with unknown taxonomic orders
  group_by(Location, Order) %>%  # group data by taxonomic order within each sampling location
  summarize(countSum=sum(Count)) %>%  # summarize counts by order within each sampling location 
  mutate(freq = scales::label_percent()(countSum / sum(countSum)))  # adds a column ('freq') that represents the percentage of each count within the total counts of a sampling site
## `summarise()` has grouped output by 'Location'. You can override using the
## `.groups` argument.
insect_data
## # A tibble: 17 × 4
## # Groups:   Location [3]
##    Location     Order         countSum freq  
##    <chr>        <chr>            <int> <chr> 
##  1 Rikert       Coleoptera           8 18.2% 
##  2 Rikert       Diptera              8 18.2% 
##  3 Rikert       Ephemeroptera        5 11.4% 
##  4 Rikert       Plecoptera           7 15.9% 
##  5 Rikert       Trichoptera         16 36.4% 
##  6 Snowbowl     Coleoptera           6 16.7% 
##  7 Snowbowl     Diptera              8 22.2% 
##  8 Snowbowl     Ephemeroptera        9 25.0% 
##  9 Snowbowl     Odonata              2 5.6%  
## 10 Snowbowl     Plecoptera           5 13.9% 
## 11 Snowbowl     Trichoptera          6 16.7% 
## 12 Sparks Brook Coleoptera           3 1.57% 
## 13 Sparks Brook Diptera             39 20.42%
## 14 Sparks Brook Ephemeroptera       52 27.23%
## 15 Sparks Brook Odonata              3 1.57% 
## 16 Sparks Brook Plecoptera          43 22.51%
## 17 Sparks Brook Trichoptera         51 26.70%

I chose to use the mutate() function (above) because I couldn’t facet wrap three pie charts together with different sample sizes. However…

Here’s where I ran into an issue. I tried to make a pie chart using insect_data but when I facet wrapped the three sampling sites together, it looked like this:

I think the issue was that I was using freq percentages, a character value, instead of a numeric value, to create the slices of the pie chart. To fix this, I found a way to add a column to my insect_data that included the frequency values without the percentage sign. This new column (freq_numeric) contains numeric values.

insect_data <- insect_data %>%
  mutate(freq = scales::label_percent()(countSum / sum(countSum)))  # create column containing values generated by dividing each countSum by the summary of countSum and converting the resulting decimal to a percentage
head(insect_data)
## # A tibble: 6 × 4
## # Groups:   Location [2]
##   Location Order         countSum freq 
##   <chr>    <chr>            <int> <chr>
## 1 Rikert   Coleoptera           8 18.2%
## 2 Rikert   Diptera              8 18.2%
## 3 Rikert   Ephemeroptera        5 11.4%
## 4 Rikert   Plecoptera           7 15.9%
## 5 Rikert   Trichoptera         16 36.4%
## 6 Snowbowl Coleoptera           6 16.7%
new_freq <- as.numeric(sub("%", "", insect_data$freq))  # create new variable that removes the percentage sign from freq values
final_insect_data <- cbind(insect_data, new_freq)  # add new variable to insect_data
## New names:
## • `` -> `...5`
final_insect_data <- rename(final_insect_data, freq_numeric = ...5)  # name column `freq_numeric`
head(final_insect_data)
## # A tibble: 6 × 5
## # Groups:   Location [2]
##   Location Order         countSum freq  freq_numeric
##   <chr>    <chr>            <int> <chr>        <dbl>
## 1 Rikert   Coleoptera           8 18.2%         18.2
## 2 Rikert   Diptera              8 18.2%         18.2
## 3 Rikert   Ephemeroptera        5 11.4%         11.4
## 4 Rikert   Plecoptera           7 15.9%         15.9
## 5 Rikert   Trichoptera         16 36.4%         36.4
## 6 Snowbowl Coleoptera           6 16.7%         16.7

Calling the freq_numeric column when creating my pie charts worked!

# generate chart
chart <- final_insect_data %>%  # create chart from insect data
  ggplot(mapping = aes(x="", y=freq_numeric, fill=Order)) +  # set y as insect count and fill as insect order
  geom_bar(stat = "identity", width=1) +  # create bar chart
  coord_polar("y", start=0) +  # make pie chart from bar chart
  scale_fill_manual(values = c("indianred3", "sienna2", "darkgoldenrod1", "darkolivegreen4", "steelblue", "plum3")) +  # set fill colors
  theme_void() +  # set chart theme
  theme(legend.position = "right") +  # add legend
  facet_wrap(~Location, ncol=3, strip.position="bottom")  # separate pie charts by sampling location

Here’s the final product!

chart

Fig. 1 Benthic macroinvertebrate abundances categorized by taxonomic order from three different stream sites in Vermont. Snowbowl and Rikert streams were sampled on the same day, Sparks Brook one week later. Samples were taken by students using a Surber sampler and identified using a taxonomic key.

With a little extra curiosity, I wanted to see if there was significant variation in benthic invertebrate assemblages between stream sites. To do so, I decided to re purpose the ANOVA function I created in Weekly Assignment 2. I chose to compare insect taxonomic order assemblages across streams, as opposed to raw counts, because we spent different amounts of time sampling at each stream site. Thus, the counts were more reflective of the time invested than the actual stream composition.

########################################################################
# FUNCTION: ANOVA_function
# returns the p-value of an ANOVA summary table
# input: a data frame
# outputs: p-value
#-----------------------------------------------------------------------
ANOVA_function <- function(df=final_insect_data){
  test <- aov(freq_numeric ~ Location, data=df)  # is there statistically significant variation in insect assemblages between stream sites?
  test_summary <- summary(test)  # makes test summary
  p_value <- test_summary[[1]][["Pr(>F)"]][1]  # pulls p value from test summary
  return(p_value)  # returns p value
}

insect_pValue <- ANOVA_function()
insect_pValue
## [1] 0.8126263

If the code above is set up correctly, the high p-value indicates that stream location had no significant impact on benthic invertebrate assemblages when considering taxonomic order.